Category Archives: Google Cloud Platform Blog

Product updates, customer stories, and tips and tricks on Google Cloud Platform

Google and Facebook share proposed new Open Rack Standard with 48-volt power architecture



Since joining OCP earlier this year, Google has been actively collaborating with Facebook around the new Open Rack Standard. Together we’ve been working with the Open Compute Project through the OCP Incubation Committee, and today we’re pleased to share our Open Rack v2.0 Standard. The proposed v2.0 standard will specify a 48V power architecture with a modular, shallow-depth form factor that enables high-density deployment of OCP racks into data centers with limited space.
Google developed a 48V ecosystem with payloads utilizing 48V to Point-of-Load technology, and has extensively deployed these high-efficiency, high-availability systems since 2010. We have seen significant reductions in losses and increased efficiency compared to 12V solutions. The improved SPUE with 48V has saved Google millions of dollars and millions of kilowatt-hours.

Our contributions to the Open Rack Standard are based on our experiences advancing the 48V architecture both with our internal teams as well as industry partners, incorporating the design expertise we've gained over the years.

The proposed new Open Rack Standard v2.0 builds on the previous 12V design. In addition to mechanical and electrical specifications, it takes a holistic approach, including details for the design of 48V power shelves, high-efficiency rectifiers, rack management controllers and rack-level battery backup units.

We've shared these designs with the OCP community for feedback, and will submit them to the OCP Foundation later this year for review. We’re looking forward to presenting the proposed standard to the OCP Engineering Workshop, August 10 at the University of New Hampshire.

If accepted, these standards will be Google’s first contributions to the OCP community, with the goal of bridging the transition from 12V to 48V architecture with ready-to-use deployment solutions for 48V payloads. We look forward to continued collaboration with adopters and contributors as we continue to develop new technologies and opportunities.

Automate deployments and traffic splitting with the App Engine Admin API



Google App Engine provides you with easy ways to manage your application from the Google Cloud Platform Console or the command line. However, there are situations when you need to manage your application programmatically. Perhaps you need to deploy to App Engine from your own custom tool chain, or you want to write your own A/B testing framework.

The App Engine Admin API lets you do all these things and more, and we're happy to announce that the API is now generally available.

You can use the Admin API not only to deploy new versions and manage traffic for any service, but also to change various configuration settings of your application, such as instance class. You can also stop individual versions in order to scale your App Engine Flexible environment deployments to zero. Finally, the API allows you to deploy several App Engine services in parallel, speeding up your deployments.

You can use the Google APIs explorer to easily test-drive the API and get a feel for what it offers.


Usage example

Let’s return to the earlier scenario: imagine that you’re writing a script to deploy a new version of your application, test it with 5% of production traffic and finally gradually shift the rest of the traffic to the new version. Let’s walk through the basic steps here; you’ll find the full instructions in our Getting started guide.

To deploy a version, you’ll generally follow these steps:
  1. Stage your application resources to a Cloud Storage bucket
  2. Convert your app.yaml file to a JSON manifest
  3. Send an HTTP POST request to the Admin API to create the new version
For this example, we’ll deploy a version for which the source code has already been staged.
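Step 2 above can be sketched as follows. This is a minimal illustration only: it assumes the app.yaml has already been parsed into a dict (e.g., with PyYAML) and covers only the fields used in the manifests in this post.

```python
def yaml_to_version_manifest(app_yaml, version_id):
    """Map a parsed app.yaml dict to an Admin API JSON version resource.

    Illustrative sketch covering only the fields used in this post's
    manifests; a real converter handles many more settings.
    """
    return {
        "id": version_id,
        "runtime": app_yaml["runtime"],
        "threadsafe": app_yaml.get("threadsafe", True),
        "handlers": [
            # app.yaml "url" patterns become "urlRegex" in the API resource
            {"urlRegex": h["url"], "script": {"scriptPath": h["script"]}}
            for h in app_yaml.get("handlers", [])
        ],
    }
```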
First, create a file called “helloworld.json” with the following contents:

{
  "deployment": {
    "files": {
      "main.py": {
        "sourceUrl": "https://storage.googleapis.com/admin-api-public-samples/hello_world/main.py"
      }
    }
  },
  "handlers": [
    {
      "script": {
        "scriptPath": "main.app"
      },
      "urlRegex": "/.*"
    }
  ],
  "runtime": "python27",
  "threadsafe": true,
  "id": "appengine-helloworld",
  "inboundServices": [
    "INBOUND_SERVICE_WARMUP"
  ]
}


Next, send an HTTP POST request to the Admin API to create the new version:

POST https://appengine.googleapis.com/v1/apps/my-application/services/default/versions

(request body: the contents of helloworld.json)


(To actually send this request, you'll need to set up authentication tokens; the Getting started guide contains the full steps.)

The response will contain the ID of a long-running operation that you can then poll to identify when the deployment has completed.
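The polling loop can be sketched like this. It's a minimal illustration: `fetch` is a stand-in for an authenticated HTTP GET of the operation resource (`https://appengine.googleapis.com/v1/{operation_name}`, where the operation name comes from the deployment response); the actual HTTP and auth wiring is out of scope here.

```python
import time

def poll_operation(fetch, operation_name, interval=5, max_attempts=60):
    """Poll a long-running Admin API operation until it reports done.

    `fetch` is any callable that returns the operation resource as a dict;
    wiring it to an authenticated HTTP GET is left out of this sketch.
    """
    for _ in range(max_attempts):
        op = fetch(operation_name)
        if op.get("done"):
            # a finished operation carries either an error or a response
            if "error" in op:
                raise RuntimeError(op["error"].get("message", "deployment failed"))
            return op.get("response")
        time.sleep(interval)
    raise TimeoutError("operation did not complete in time")
```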

To split traffic between versions, first create another version. Follow the steps above to deploy an “appengine-goodbyeworld” version using this JSON manifest:

{
  "deployment": {
    "files": {
      "main.py": {
        "sourceUrl": "https://storage.googleapis.com/admin-api-public-samples/goodbye_world/main.py"
      }
    }
  },
  "handlers": [
    {
      "script": {
        "scriptPath": "main.app"
      },
      "urlRegex": "/.*"
    }
  ],
  "runtime": "python27",
  "threadsafe": true,
  "id": "appengine-goodbyeworld",
  "inboundServices": [
    "INBOUND_SERVICE_WARMUP"
  ]
}


Once the version is successfully deployed, route 50% of traffic to it with the following request:

PATCH https://appengine.googleapis.com/v1/apps/my-application/services/default/?updateMask=split

{ "split": { "shardBy": "IP", "allocations": { "appengine-helloworld": 0.5, "appengine-goodbyeworld": 0.5 } } }

Now visit your application at http://my-application.appspot.com (substituting your own application ID). As you reload the page, you'll see the contents change depending on which version your request is routed to.

When it’s time to make the new version the primary one, you can use App Engine’s Traffic Migration feature, which gradually shifts all traffic to the new version as quickly as possible while giving new instances sufficient time to warm up:


PATCH https://appengine.googleapis.com/v1/apps/my-application/services/default/?updateMask=split&migrateTraffic=true

{ "split": { "shardBy": "IP", "allocations": { "appengine-goodbyeworld": 1 } } }
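The split bodies in the PATCH requests above are easy to build and sanity-check programmatically. Here's a small sketch; the validation mirrors the API's requirement that traffic allocations sum to 1:

```python
def traffic_split_body(allocations, shard_by="IP"):
    """Build the request body for PATCH ...?updateMask=split.

    `allocations` maps version IDs to traffic fractions; the Admin API
    requires that the fractions sum to 1.
    """
    total = sum(allocations.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allocations must sum to 1, got %s" % total)
    return {"split": {"shardBy": shard_by, "allocations": dict(allocations)}}
```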


More information

The App Engine Admin API documentation contains full instructions on how to use the API, including how to authenticate to the API, deploy versions and set traffic splits.

We hope the Admin API simplifies your day-to-day workflows by letting you manage your App Engine application from the tools you already use.

Building immutable entities into Google Cloud Datastore



Editor's note: Today, we hear from Aleem Mawani, co-founder of Streak.com, a Google Cloud Platform customer whose customer relationship management (CRM) for Google Apps is built entirely on top of Google products: Gmail, Google App Engine and Google Cloud Datastore. Read on to learn how Streak added advanced functionality to the Cloud Datastore storage system.


Streak is a full-blown CRM built directly into Gmail. We’re built on Google Cloud Platform (most heavily on Google App Engine) and we store terabytes of user data in Google Cloud Datastore. It’s our primary database, and we’ve been happy with its scalability, consistent performance and zero-ops management. However, we did want more functionality in a few areas. Instead of overwriting database entities with their new content whenever a user updated their data, we wanted to store every version of those entities and make them easy to access. Specifically, we wanted a way to make all of our data immutable.

In this post, I’ll go over why you might want to use immutable entities, and our approach for implementing them on top of Cloud Datastore.

There are a few reasons why we thought immutable entities were important.

  1. We wanted an easy way to implement a newsfeed-style UI. Typical newsfeeds show users, in a graphical format, how an entity has changed over time. Traditionally we stored separate side entities to record the deltas between different versions of a single entity, then queried for those side entities to render a newsfeed. Designing these side entities was error-prone and not easily maintainable. For example, if you added a new property to your entity, you needed to remember to also add it to the side entities; and if you forgot to record certain data in the side entities, there was no way to reconstruct it later when you did need it: the data was gone forever.
  2. Immutable entities make historical records easy. For example, our "Contact" entity stores data about users’ contacts. Because it’s implemented as an immutable entity, it's easy to generate a historical record of how a contact has changed over time.
  3. Having immutable entities allows us to recover from user errors very easily. Users can roll back their data to earlier versions or even recover data they may have accidentally deleted (see how we implemented deletion below)[1].
  4. Potentially easier debugging. It’s often useful to see how an entity changed over time and got into its current state. We can also run historical queries on the number of changes to an entity, which is useful for user behavior analysis or performance optimization.


Some context

Before we go into our implementation of immutable entities on the Cloud Datastore, we need to understand some of the basics of how the datastore operates. If you’re already familiar with the Cloud Datastore, feel free to skip this section.

You can think of the Cloud Datastore as a key-value store. A value, called an entity in the datastore, is identified by its key, and the entity itself is just a bag of properties. There's no enforcement of a schema on all entities in a table so the properties of two entities need not be the same.

The database also supports basic queries on a single table: there are no joins or aggregations, just simple table scans for which an index can be built. While this may seem limiting, it enables fast and consistent query performance because you will typically denormalize your data.

The most important property of Cloud Datastore for our implementation of immutable entities is “entity groups.” Entity groups are groups of entities for which you get two guarantees:
  1. Queries that are restricted to a single entity group get consistent results. This means that a write immediately followed by a query will have results that are guaranteed to reflect the changes made by the write. Conversely, if your query is not limited to a single entity group you may not get consistent results (stale data).
  2. Multi-entity transactions can only be applied within a single entity group. (This was recently improved: Cloud Datastore now supports cross-entity-group transactions, but limits the number of entity groups involved to 25.)
Both of these facts will be important in our implementation. For more details on how the Cloud Datastore itself works, see the documentation.

How we implemented immutable entities

We needed a way to store every change we made to a single entity while supporting the common operations for entities: get, delete, update, create and query.

The overall strategy we took was to use two levels of abstraction: a "datastore entity" and a "logical entity." We use individual datastore entities to represent individual versions of a logical entity. Users of our API only interact with logical entities; each logical entity has a key to identify it and supports the common get, create, update, delete and query operations. These logical entities are backed by actual datastore entities comprising the different versions of that logical entity. The most recent, or tip, version of the datastore entities represents the current value of the logical entity.

First, let’s start with what the data model looks like. Here’s how we designed our entity:

(Diagram: the immutable entity data model.)
The way this works is that we store a new datastore entity every time the user makes a change to the entity. The most recent datastore entity has its isTip value set to true, and the rest don’t. We'll use this field later to query for a particular logical entity by fetching the tip datastore entity; this query is fast in the datastore because all queries are required to have indexes. We also store the timestamp for when each datastore entity was created.

The versionId field is a globally unique identifier for each datastore entity. These IDs are automatically assigned by Cloud Datastore when we store the entity.

The consistentId identifies a logical entity: it's the ID we can give to users of this API. All of the datastore entities in a logical entity have the same consistent ID. We picked the consistent ID of the logical entity to be equal to the ID of the first datastore entity in the chain. This is somewhat arbitrary, and we could have picked any unique identifier, but since the low-level Cloud Datastore API gives us a unique ID for every datastore entity, we decided to use the first one as our consistent ID.

The other interesting part of this data model is the firstEntityInChain field. What's not shown in the diagram is that every datastore entity has its parent (the parent determines the entity group) set to the first datastore entity in the chain. It's important that all the datastore entities in the chain (including the first one) have the same parent and are thus in the same entity group so that we can perform consistent queries. You’ll see why these are needed below.

Here’s the same immutable entity defined in code. We use the awesome Objectify library with the Cloud Datastore, and the snippets below make use of it.

public class ImmutableDatastoreEntity<T> { // T is the concrete entity subclass
  @Id
  Long versionId;

  @Parent
  Key<T> firstEntityInChain;

  protected Long consistentId;
  protected boolean isTip;

  Key<User> savedByUser;
}

So how do we perform common operations on logical entities given that they are backed by datastore entities?

Performing creates

When creating a logical entity, we just need to create a single new datastore entity, using Cloud Datastore’s ID allocation to set the versionId field and the consistentId field to the same value. We also set the parent key (firstEntityInChain) to point to the entity itself, and set isTip to true so we can query for this entity later. Finally, we set the timestamp and the creator of the datastore entity and persist it to Cloud Datastore.

ImmutableDatastoreEntity entity = new ImmutableDatastoreEntity();
entity.setVersionId(DAO.allocateId(this.getClass()));
entity.setConsistentId(entity.getVersionId());
entity.setFirstEntityInChain((Key<T>) Key.create(entity.getClass(), entity.getVersionId()));
entity.setTip(true);
ofy().save(entity).now();


Performing updates

To update a logical entity with new data, we first need to fetch the most recent datastore entity in the chain (we describe how in the "get" section below). We then create a new datastore entity and set the consistentId and firstEntityInChain to those of the previous datastore entity in the chain. We set isTip to true on the new datastore entity and set it to false on the old datastore entity (note this is the only instance in which we modify an existing entity, so we aren’t 100% immutable).

We finally fill in the timestamp and user keys fields, and we’re ready to store the new datastore entity. Two important points on this: for the new datastore entity, we can let the datastore automatically allocate the ID when storing the entity (because we don’t need to use it anywhere else). Second, it's incredibly important that we fetch the existing datastore entity and store both the new and old datastore entity in the same transaction. Without this, our data could become internally inconsistent.



// start transaction
ImmutableDatastoreEntity oldVersion = getImmutableEntity(immutableId);
oldVersion.setTip(false);
ImmutableDatastoreEntity newVersion = oldVersion.clone();
// make the user edits needed
newVersion.setVersionId(null); // let Cloud Datastore allocate a fresh version ID on save
newVersion.setConsistentId(oldVersion.getConsistentId());
newVersion.setFirstEntityInChain(oldVersion.getFirstEntityInChain());
// .clone() already copies the two fields above; we set them here just to be explicit
newVersion.setTip(true);
ofy().save(oldVersion, newVersion).now();
// end transaction

Performing gets

Performing a get actually requires us to do a query operation to the datastore because we need to find the datastore entity that has a certain consistentId AND has isTip set to true. This entity will represent the logical entity. Because we want the query to be consistent, we must perform an ancestor query (i.e., tell Cloud Datastore to limit the query to a certain entity group). This only works because we ensured that all datastore entities for a particular logical entity are part of the same entity group.

This query should only ever return one result: the datastore entity that represents the logical entity.


Key<ImmutableDatastoreEntity> ancestorKey =
    Key.create(ImmutableDatastoreEntity.class, consistentId);
ImmutableDatastoreEntity e = ofy().load()
    .type(ImmutableDatastoreEntity.class)
    .ancestor(ancestorKey) // this limits our query to just the one entity group
    .filter("consistentId", consistentId)
    .filter("isTip", true)
    .first()
    .now();

Performing deletes

In order to delete a logical entity, all we need to do is set isTip on the most recent datastore entity to false. By doing this we ensure that the "get" operation described above no longer returns a result, and similarly, that the queries described below no longer include it.


// wrap block in a transaction
ImmutableDatastoreEntity oldVersion = getImmutableEntity(immutableId);
oldVersion.setTip(false);
ofy().save(oldVersion).now();

Performing queries

We need to be able to perform queries across all logical entities. However, when querying every datastore entity, we need to modify our queries so that they only consider the tip datastore entity of each logical entity (unless you explicitly want to find old versions of the data). To do this, we add an extra filter to our queries to consider only tip entities. One important thing to note is that we cannot do consistent queries in this case, because we cannot guarantee that all the results will be in the same entity group (in fact, we know for certain they are not if there are multiple results).


List<ImmutableDatastoreEntity> results = ofy().load()
    .type(ImmutableDatastoreEntity.class)
    .filter("isTip", true)
    // apply other filters here
    .list();

Performing newsfeed queries

One of our goals was to be able to show how a logical entity has changed over time, so we must be able to query for all datastore entities in a chain. Again, this is a fairly straightforward query: we can just query by the consistentId and order by the timestamp. This gives us all versions of the logical entity. We can diff each datastore entity against the previous one to generate the data needed for a newsfeed.

Key<ImmutableDatastoreEntity> ancestorKey =
    Key.create(ImmutableDatastoreEntity.class, consistentId);
List<ImmutableDatastoreEntity> versions = ofy().load()
    .type(ImmutableDatastoreEntity.class)
    .ancestor(ancestorKey)
    .filter("consistentId", consistentId)
    .order("timestamp") // assumes a "timestamp" property stored on each version
    .list();
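The diff step itself is generic. As a minimal illustration (shown in Python for brevity, treating each version as a plain dict of properties; the Java equivalent would iterate the entity's properties the same way):

```python
def diff_versions(old, new):
    """Return {property: (old_value, new_value)} for every changed property.

    Illustrative sketch: versions are plain dicts of entity properties.
    A property missing from one side shows up with None on that side.
    """
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}
```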

Downsides

Using the design described above, we were able to achieve our goal of having roughly immutable entities that are easy to debug and make it easy to build newsfeed-like features. However, there are some drawbacks to this method:
  1. We need to do a query any time we need to get an entity. In order to get a specific logical entity, we actually need to perform a query as described above. On Cloud Datastore, this is a slower operation than a traditional "get" by key. Additionally, Objectify offers built-in caching, which also can’t be used when trying to get one of our immutable entities (because Objectify can’t cache queries). To address this, we’ll need to implement our own caching in memcache if performance becomes an issue.
  2. There's no method to do a batch get of entities. Because each query must be restricted to a single entity group for consistency, we can’t fetch the tip datastore entity for multiple logical entities with just one datastore operation. To address this, we perform multiple asynchronous queries and wait for all to finish. This isn’t ideal or clean, but it works fairly well in practice. Remember that on App Engine there's a limit of 30 outstanding RPCs when making concurrent RPC calls, so this only takes you so far.
  3. High implementation cost for the first entity. We abstracted most of the design described above so that future immutable entities would be cheap for us to implement; however, the first entity wasn’t trivial. It took us some time to iron out all the kinks, so it’s only worth doing this if you very much need immutability, or if you’ll be spreading the implementation cost across many use cases.
  4. Entities are never actually deleted. By design, we don’t delete immutable entities. However, users may have the expectation that once they delete something in our app, we actually delete the data. This may also be the expectation in some regulated industries (e.g., healthcare). For our use case it wasn't necessary, but you may want a periodic batch task that maps over your dataset, finds fully deleted logical entities and deletes all of the datastore entities representing them.
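For the first drawback, the kind of read-through cache we have in mind would look roughly like this. This is an illustrative Python sketch only: the dict stands in for memcache, and `backing_get` stands in for the consistent tip query; in our Java stack this would use the App Engine Memcache service, with invalidation on every update and delete.

```python
class ReadThroughCache:
    """Sketch of read-through caching for logical-entity gets.

    `backing_get` runs the consistent tip query for a consistent ID;
    `store` is a stand-in for memcache. Callers must invalidate on
    every update or delete so cached tips never go stale.
    """
    def __init__(self, backing_get):
        self.backing_get = backing_get
        self.store = {}

    def get(self, consistent_id):
        # serve from cache when possible, else fall back to the query
        if consistent_id not in self.store:
            self.store[consistent_id] = self.backing_get(consistent_id)
        return self.store[consistent_id]

    def invalidate(self, consistent_id):
        self.store.pop(consistent_id, None)
```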

Next steps

We’ve only been running with immutable entities in production for a little while, and it remains to be seen what problems we’ll face. And as we implement a few more of our datasets as immutable entities, it will become clear whether the implementation costs were worth the effort. Subscribe to our blog to get updates.

If this sort of data infrastructure floats your boat, definitely reach out to us as we have several openings on our backend team. Check out our job postings for more info.

Discuss on Hacker News



[1] This is very similar to the idea of MVCC (https://en.wikipedia.org/wiki/Multiversion_concurrency_control), which is how many modern databases implement transactions and rollback.


Running the same, everywhere part 2: getting started



In part one of this post, we looked at how to avoid lock-in with your cloud provider by selecting open-source software (OSS) that can run on a variety of clouds. Sounds good in theory, but I can hear engineers and operators out there saying, “OK, really, how do I do it?”

Moving from closed to open isn’t just about knowing the names of the various OSS piece-parts, and then POOF! you're magically relieved of having to make tech choices for the next hundred years. It’s a process in which you choose more and more open systems and gradually gain more power.

Let’s assume that you’re not starting from scratch (if you are, please use the open tools we’ve described here rather than more proprietary options!). If you’ve already built an application that consumes some proprietary components, the first step is to prioritize migration from those components to open alternatives. Of course, this starts with knowing about those alternatives (check!) and then following a given product’s documentation for initialization, migration and operations.

But before we dive into specific OSS components, let’s put forth a few high-level principles.

  1. Applications that are uniformly distributed across distinct cloud providers can be complex to manage. It’s often substantially simpler and more robust to load-balance entirely separate application systems than it is to have one globally conjoined infrastructure. This is particularly true for any services that store state, such as storage and database tools; in many cases, setting up replication across providers for HA is the most direct path to value.
  2. The more you can minimize the manual work required to relocate services from one system to another, the better. This of course can require very nuanced orchestration and automation, and its own sets of skills. Your level of automated distribution may vary between different layers of your stack; most companies today can get to “apps = automated” and “data = instrumented” procedures relatively easily, but “infra = automated” might take more effort.
  3. No matter how well you think migrating these systems will work, you won’t know for sure until you try. Further, migration flexibility atrophies without regular exercise. Consider performing regular test migrations and failovers to prove that you’ve retained flexibility.
  4. Lock-in at your “edges” is easier to route around or resolve than lock-in at your “core.” Consider open versions of services like queues, workflow automation, authentication, identity and key management as particularly critical.
  5. Consider the difference in kind between “operational lock-in” versus “developer lock-in.” The former is painful, but the latter can be lethal. Consider especially carefully the software environments you leverage to ensure that you avoid repetitive work.


Getting started

With that said, let’s get down to specifics and look at the various OSS services that we recommend when building this kind of multi-cloud environment.

If you choose Kubernetes for container orchestration, start off with a Hello World example, take an online training course, follow setup guides for Google Container Engine and Elastic Compute Cloud (EC2), familiarize yourself with the UI, or take the docker image of an existing application and launch it. Perhaps you have applications that require communications between all hosts? If you’re distributed across two cloud providers, that means you’re distributed across two networks, and you’ll likely want to set up VPN between the two environments to keep traffic moving. If it’s a large number of hosts or a high-bandwidth interaction, you can use Google Cloud Interconnect.

If you’re using Google App Engine and AppScale for platform-as-a-service, the process is very similar. To run on the Google side, follow the App Engine documentation, and for AppScale in another environment, follow their getting started guide. If you need cross-system networking, you can use VPN or, for scaled systems, Cloud Interconnect.

For shops running HBase and Google Cloud Bigtable as their big data store, follow the Cloud Bigtable cluster creation guide on the Cloud Platform side, and the HBase quickstart (as well as the longer-form not-so-quick-start guides). There’s some complexity in importing data from other sources into an HBase-compatible system; there’s a manual for that here.

The Vitess database clustering system is an interesting example, in that the easiest way to get started with it is to run it inside of the Kubernetes system we built above. Instructions for that are here, and the output is a scalable MySQL system.

For Apache Beam/Cloud Dataflow batch and stream data processing, take a look at the GCP documentation to learn about the service, and then follow it up with some practical exercises in the How-to guides and Quickstarts. You can also learn more about the open source Apache Beam project on the project website.

For TensorFlow, things couldn’t be simpler. This OSS machine learning library is available via Pip and Docker, and plays nicely with Virtualenv and Anaconda. Once you’ve installed it, you can get started with Hello TensorFlow, or other tutorials such as MNIST For ML Beginners, or this one about state of the art translation with Recurrent Neural Nets.

The Minio object storage server is written in Golang, and as such, is portable across a wide variety of target platforms, including Linux, Windows, OS X and FreeBSD. To get started, head over to their Quickstart Guide.

Spinnaker is an open-source continuous delivery engine that allows you to build complex pipelines that take your code from a source repository to production through a series of stages: for example, waiting for code to go through unit testing and integration phases in parallel before pushing it to staging and production. To get started with continuous deployment with Spinnaker, have a look at their deployment guide.

But launching and configuring these open systems is really just the beginning; you’ll also need to think about operations, maintenance and security management, whether they run in a single- or multi-cloud configuration. Multi-cloud systems are inherently more complex, and the operational workflow will take more time.

Still, compared to doing this at any previous point in history, these open-source tools radically improve businesses’ capacity to operate free of lock-in. We hear from customers every day that OSS tools are an easy choice, particularly for scaled, production workloads. Our goal is to partner with customers, consultancies and the OSS community of developers to extend this framework and ensure this approach succeeds. Let us know if we can help you!

Cloud Shell now GA, and still free



Google Cloud Shell is a command line interface that allows you to manage your Google Cloud Platform infrastructure from any computer with an internet connection. Last year we extended the free beta period through the end of 2016 so you could try it out longer. Now, we’re excited to announce that Cloud Shell is generally available and free to use.

For those of you who haven’t tried it yet, Cloud Shell offers quick access to a temporary VM that's hosted and managed by Google and includes all the popular tools that you need to manage your GCP environment. For example, you can use the Cloud SDK to manage Cloud Storage data or run and deploy an App Engine application. You can keep files between sessions in 5GB of personal persistent storage.
Cloud Shell provides a resizable window inside the Cloud Console.

To open Cloud Shell from the Cloud Console, simply click on the Cloud Shell icon in the top-right corner.

The Cloud Shell documentation has a variety of tutorials to help you get started. In addition, here are a few pro-tips:
  • To switch to a light theme, look under the gear icon
  • Cloud Shell supports the terminal multiplexer tmux. Toggle it on or off from Cloud Shell to use different options in various Cloud Console tabs.
  • To pop out the entire console window, click the pop out icon
As always, send us feedback using the "Send Feedback" link in the top right of the Cloud Console or within Cloud Shell under the gear icon. We’re excited to see how you use Cloud Shell and how we can make it even more useful.

How Google protects your data: Customer-Supplied Encryption Keys for Compute Engine goes GA!



Control over data or agility of the cloud? Why not both? We are pleased to announce that Customer-Supplied Encryption Keys (CSEK) for Compute Engine is now generally available, allowing you to take advantage of the cloud while protecting your Google Compute Engine disks with keys that you control.

Google Cloud Platform automatically encrypts customer content stored at rest, including all Compute Engine disks, using one or more encryption mechanisms. We use encryption to help keep your data private and secure. You can learn more by reading our whitepaper, “Encryption at Rest in Google Cloud Platform,” which takes an in-depth look at encryption at rest across Cloud Platform.

With CSEK, disks at rest are protected with your own key that cannot be accessed by anyone, inside or outside of Google, unless they present your key. Google does not retain your keys and only holds them transiently to fulfill your request, such as attaching a disk or starting a VM.

We designed Customer-Supplied Encryption Keys to be secure, fast and easy.

"Customer-supplied encryption keys give us the fidelity and granular control to provide strong data-protection assurances to our customers. It's a critical feature and Google's approach is key to our end-to-end security posture."
- Neil Palmer, CTO, Advanced Technology at FIS Global

"Customer-supplied keys have integrated seamlessly with the fully automated Kubernetes pods and projects that drive Kensho's machine intelligence platform on GCP."
- Matt Taylor, CTO at Kensho

Customer-Supplied Encryption Keys for Compute Engine is available in select countries. Later this month, we’re expanding to Australia, Italy, Mexico, Norway, and Sweden. If your organization needs this capability and you don’t see your country listed, please let us know. We use your input to determine where it becomes available next.

See you in the cloud!


Updated and expanded: Google Cloud Platform for AWS Professionals

Last year, we published a guide for our customers who had familiarity and expertise with AWS but wanted to learn how it compares to Google Cloud Platform. The guide had a really positive reception, helping customers understand things like how Cloud Platform delivers Infrastructure as a Service with Google Compute Engine and how our VPN works.

Today, we're happy to announce a major expansion to the Cloud Platform for AWS Professionals guide, with new sections covering Big Data services, Storage services and Containers as a Service (Google Container Engine).
  • Amazon ECS vs. Google Container Engine at a glance
  • How Amazon Elastic MapReduce compares to Google Cloud Dataproc and Cloud Dataflow
As we said last year, this guide is a work-in-progress. We have some ideas about what topics we’d like to tackle next (services like Databases and Development tools) but we’d also love to hear what you think we should cover.

We hope you find this information useful and that it makes learning about Cloud Platform enjoyable. Please tell us what you think, and be sure to sign up for a free trial!

Save money, improve performance with new VM Rightsizing Recommendations



Here at Google Cloud Platform, we pride ourselves on efficiency and reducing waste, from our datacenters all the way to individual virtual machine instances. We want to make sure that our users are getting good value for their money. To that end, VM Rightsizing Recommendations is now in beta.

Knowing which VM machine type to choose before you start running your workloads is challenging, especially given that your load may change over time. Select a machine that’s too large and you overpay. Select a machine that’s too small and your service is starved for resources. And while it’s possible to split some workloads amongst identical machines and use the Managed Instance Group Autoscaler to balance resources, that doesn’t work for everything. Some workloads, such as databases or file servers, can’t be easily distributed.

VM Rightsizing Recommendations can help you see at a glance if your machines are the right size for the work that you assigned them. It monitors your CPU and RAM usage over time and makes recommendations about the size of your VMs. When applicable, it estimates how much you could save or whether your instances are overloaded and displays that information right on the VM Instances page — just look for the light bulb!
The VM instances page shows instances where you can save money or increase performance, as well as estimated savings per instance and in total

When you’re ready, you can resize your VMs with a single click. And if your workload changes and your machine type is no longer a fit, we’ll let you know.
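The recommendation logic itself is internal to the service, but the underlying idea can be illustrated with a toy heuristic. The utilization thresholds and custom machine-type names below are illustrative assumptions, not the product's actual rules:

```python
def recommend_machine(avg_cpu_util: float, avg_mem_util: float,
                      vcpus: int, mem_gb: float) -> str:
    """Toy rightsizing heuristic: suggest a resize when sustained
    utilization sits far below (or above) comfortable thresholds.
    Custom machine types are named custom-<vCPUs>-<memory MB>."""
    if avg_cpu_util < 0.25 and avg_mem_util < 0.25 and vcpus > 1:
        return f"downsize: try custom-{vcpus // 2}-{int(mem_gb * 1024 // 2)}"
    if avg_cpu_util > 0.9 or avg_mem_util > 0.9:
        return f"upsize: try custom-{vcpus * 2}-{int(mem_gb * 1024 * 2)}"
    return "keep current machine type"


# A 4-vCPU, 15 GB machine idling at 10% utilization:
print(recommend_machine(0.10, 0.10, 4, 15))  # downsize: try custom-2-7680
```

The real service works from weeks of monitoring data rather than two averages, but the shape of the decision is the same.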

For more information about the VM Rightsizing Recommendations beta, including how we come up with the recommendations, click here.

What if you could run the same, everywhere?



Is multi-cloud a pipe dream? I think not!

From startups to enterprises, despite material increases in efficiency and in the price-to-performance ratio of the compute, network and storage resources we all use, infrastructure continues to come at substantial cost. It can also be a real risk driver; each implementation choice affects the future scalability, service level and flexibility of the services being built. It’s fair to say that “future-proofing” should be the primary concern of every system architect.

Providers of infrastructure aren’t disinterested actors either; there are huge incentives for any vendor to increase lock-in through contractual, fiscal and technical constrictions. In many cases, interest in cloud infrastructure, particularly among existing consumers of infrastructure, has been driven by a strong urge to break free of enterprise vendor relationships for which the lock-in costs are higher than the value provided. Once they have some kind of lock-in working, infrastructure companies know that they can charge higher rents without necessarily earning them.

So, how can you swing the power dynamic around so that you, as the consumer of infrastructure, get the most value out of your providers at the lowest cost?

A good first step is to actively resist lock-in mechanisms. Most consumers have figured out that long-term contractual commitments can be dangerous. Most have figured out that pre-paid arrangements distort decision-making and can be dangerous. Technical lock-in remains one of the hardest to avoid. Many providers wrap valuable differentiated services in proprietary APIs so that applications eventually get molded around their design. These “sticky services” or “loss leaders” create substantial incentives for tech shops to take the shorter path to value and accept a bit of lock-in risk. This is a prevalent form of technical debt, especially when new vendors release even more powerful and differentiated tools in the same space, or when superior solutions rise out of OSS communities.

In the past, some companies tried to help users get out from under this debt by building abstraction layers on top of the proprietary APIs from each provider, so that users could use one tool to broker across multiple clouds. This approach has been messy and fragile, and tends to compromise to the lowest-common denominator across clouds. It also invites strategic disruption from cloud providers in order to preserve customer lock-in.

Open architectures

Thankfully, this isn’t the only way technology works. It’s entirely possible to efficiently build scaled, high performance, cost-efficient systems without accepting unnecessary technical lock-in risk or tolerating the lowest-common denominator. You can even still consume proprietary infrastructure products, as long as you can prove to yourself that because those products expose open APIs, you can move when you want to. This is not to say that this isn’t complex, advanced work. It is. But the amount of time and effort required is shrinking radically every day. This gives users leverage; as your freedom goes up, it becomes easier and easier to treat providers like the commodities they ought to be.

We understand the value of proprietary engineering. We’ve created a purpose-built cloud stack, highly tuned for scale, performance, security, and flexibility. We extract real value from this investment, through our advertising, applications and cloud businesses. But GCP, along with some other providers and members of the broader technology community, recognizes that when users have power, they can do powerful things. We’ve worked hard to deliver services that are differentiated by their performance, stability and cost, but not by proprietary, closed APIs. We know this means that you can stop using us when you want to; we think that gives you the power to use us at lower risk. Some awesome folks have started calling this approach GIFEE, or “Google Infrastructure For Everyone Else.” But given the overwhelming participation and source code contributions — including those for Kubernetes — from individuals and companies of all sizes to the OSS projects involved, it’s probably more accurate to call it Everyone’s Infrastructure, For Every Cloud — unfortunately that’s a terrible acronym.

A few salient examples:
Applications can run in containers on Kubernetes, the OSS container orchestrator that Google helped create, either managed and hosted by us via GKE, or on any provider, or both at the same time.

Kubernetes ensures that your containers aren’t locked in.
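To make that concrete, a minimal Deployment manifest like the sketch below runs unchanged on Container Engine or on any other conformant Kubernetes cluster; the app name and container image are placeholders.

```yaml
# Minimal Deployment: three replicas of one container.
# The same manifest applies on GKE or any Kubernetes cluster.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hello-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: gcr.io/my-project/hello-app:1.0   # placeholder image
        ports:
        - containerPort: 8080
```

Moving the workload is then a matter of pointing `kubectl` at a different cluster and re-applying the same file.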





Web apps can run in a PaaS environment like AppScale, the OSS application management framework, either managed and hosted by us via Google App Engine, or on any provider, or both at the same time. Importantly, this includes the NoSQL transactional stores required by apps, either powered by AppScale, which uses Cassandra as a storage layer and vends the Google App Engine Datastore API to applications, or native in App Engine.

AppScale ensures that your apps aren’t locked in.




NoSQL k-v stores can run Apache HBase, the OSS NoSQL engine inspired by our Bigtable whitepaper, either managed and hosted by us via Cloud Bigtable, or on any other provider, or both at the same time.

HBase ensures that your NoSQL isn’t locked in.



OLAP systems can be run using Druid or Drill, two OSS OLAP engines inspired by Google’s Dremel system. Both are very similar to BigQuery and can run on any infrastructure.


Druid and Drill ensure that your OLAP system isn’t locked in.




Advanced RDBMS can be built in Vitess, the OSS MySQL toolkit we helped create, either hosted by us inside Google Container Engine, or on any provider via Kubernetes, or both at the same time. You can also run MySQL fully managed on GCP via CloudSQL.

Vitess ensures that your relational database isn’t locked in.




Data orchestration can run in Apache Beam, the OSS ETL engine we helped create, either managed and hosted by us via Cloud Dataflow, or on any provider, or both at the same time.

Beam ensures that your data ETL isn’t locked in.




Machine Learning can be built in TensorFlow, the OSS ML toolkit we helped create, either managed and hosted by us via CloudML, or on any provider, or both at the same time.

TensorFlow ensures that your ML isn’t locked in.




Object storage can be built on Minio.io, which vends the S3 API via OSS, either managed and hosted by us via GCS (which also emulates the S3 API), or on any provider, or both at the same time.

Minio ensures that your object store isn’t locked in.




Continuous Deployment tooling can be delivered using Spinnaker, a project started by the Netflix|OSS team, either hosted by us via GKE, or on other providers, or both at the same time.

Spinnaker ensures that your CD tooling isn’t locked in.



What’s still proprietary, but probably OK?

CDN, DNS, Load Balancing

Because the interfaces to these kinds of services are network configurations rather than code, they have so far remained proprietary across providers. NGINX and Varnish make excellent OSS load balancers/front-end caches, but because of the low-friction, low-risk switchability, there’s no real need to avoid DNS or LB services on public clouds. And if you’re doing really dynamic work and writing code to automate these services, Netflix’s Denominator can help keep your code connected to open interfaces.

File Systems

These are still pretty hard for cloud providers to deliver as managed services at scale; Gluster FS, Avere, ZFS and others are really useful to deliver your own POSIX layers irrespective of environment. If you’re building inside Kubernetes, take a look at CoreOS’s Torus project.

It’s not just software, it’s the data
Lock-in risk comes in many forms, one of the most powerful being data gravity, or data inertia. Even if all of your software can move between infrastructures with ease, those systems are connected by a limited, throughput-constrained internet, and once you have a petabyte written down, it can be a pain in the neck to move. What good is software you can move in a minute if it takes a month to move the bytes?
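The “month to move the bytes” claim is easy to sanity-check with back-of-the-envelope arithmetic; the link speeds and the 70% utilization factor below are illustrative assumptions:

```python
def transfer_days(bytes_total: float, gbps: float,
                  efficiency: float = 0.7) -> float:
    """Days to push bytes_total over a link of gbps gigabits/second,
    assuming a given effective utilization of the link."""
    seconds = (bytes_total * 8) / (gbps * 1e9 * efficiency)
    return seconds / 86400.0


# One petabyte over a well-utilized 10 Gbps link:
print(round(transfer_days(1e15, 10), 1))  # prints 13.2 (about two weeks)

# The same petabyte over 4 Gbps really is "a month to move the bytes":
print(round(transfer_days(1e15, 4), 1))   # prints 33.1
```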

There are lots of tools that help, both native from GCP, and from our growing partner ecosystem.
  • If your data is in an object store, look no further than the Google Storage Transfer Service, an easy automated tool for moving your bits from A to G.
  • If you have data on tape or disk, take a look at the Offline Media Import/Export service. You might need to regularly move data to and from our cloud, so take a look at Google Cloud Interconnect for leveraging carriers or public peering points to connect reliably with us.
  • If you have VM images you’d like to move to cloud quickly we recommend Cloud Endure to move and transform your images for running on Google Compute Engine.
  • If you have a database you need replicated, take a look at Attunity CloudBeam. If you’re trying to migrate bulk data, try FDT from CERN.
  • If you’re doing data imports, perhaps Embulk.

Conclusion

We hope the above helps you choose open APIs and technologies designed to help you grow without locking you in. That said, remember that the real proof you have the freedom to move is to actually move; try it! Customers have told us about their new-found power at the negotiating table when they can demonstrably run their application across multiple providers.

All of the above mentioned tools, in combination with strong private networking between providers, allow your applications to span providers with a minimum of provider-specific implementation detail.

If you have questions about how to implement the above, about other parts of the stack this kind of thinking applies to, or about how you can get started, don’t hesitate to reach out to us at Google Cloud Platform; we’re eager to help.
