
New GitHub repo: Using Firebase to add cloud-based features to games built on Unity



A while back, a group of us Google Cloud Platform Developer Programs Engineers teamed up with gaming fans in Firebase Engineering to work on an interesting project. We all love games, gamers, and game developers, and we wanted to support those developers with solutions that accomplish common tasks so they can focus more on what they do best: making great games.

The result was Firebase Unity Solutions, an open-source GitHub repository with sample projects and scripts. These projects use Firebase tools and services to help you add cloud-based features to games built on Unity.

Each feature will include all the required scripts, a demo scene, any custom editors to help you better understand and use the provided assets, and a tutorial to use as a step-by-step guide for incorporating the feature into your game.

The only requirements are a Unity project with the .NET 2.0 API level enabled, and a project created with the Firebase Console.

Introducing Firebase Leaderboard


Our debut project is the Firebase_Leaderboard, a set of scripts that utilize Firebase Realtime Database to create and manage a cross-platform high score leaderboard. With the LeaderboardController MonoBehaviour, you can retrieve any number of unique users’ top scores from any time frame. Want the top 5 scores from the last 24 hours? Done. How about the top 100 from last week? You got it.

Once a connection to Firebase is established, scores are retrieved automatically, including any new scores that come in while the controller is enabled.

If any of the controller's query parameters are modified (the number of scores to retrieve, or the start or end date), the scores are automatically refreshed. The content is always up to date!

private void Start() {
    // Locate the LeaderboardController in the scene and subscribe to its events.
    this.leaderboard = FindObjectOfType<LeaderboardController>();
    leaderboard.FirebaseInitialized += OnInitialized;
    leaderboard.TopScoresUpdated += UpdateScoreDisplay;
    leaderboard.UserScoreUpdated += UpdateUserScoreDisplay;
    leaderboard.ScoreAdded += ScoreAdded;

    MessageText.text = "Connecting to Leaderboard...";
}
With the same component, you can add new scores for current users as well, meaning a single script handles both read and write operations on the top score data.

public void AddScore(string userId, int score) {
    leaderboard.AddScore(userId, score);
}
For step-by-step instructions on incorporating this cross-platform leaderboard into your Unity game using Firebase Realtime Database, follow the instructions here. Or check out the Demo Scene to see a version of the leaderboard in action!

We want to hear from you

We have ideas for what features to add to this repository moving forward, but we want to hear from you, too! What game feature would you love to see implemented in Unity using Firebase tools? What cloud-based functionality would you like to be able to drop directly into your game? And how can we improve the Leaderboard, or other solutions as they are added? You can comment below, create feature requests and file bugs on the GitHub repo, or join the discussion in this Google Group.

Let’s make great games together!

Hangouts Chat alerts & notifications… with asynchronous messages

Posted by Wesley Chun (@wescpy), Developer Advocate, G Suite

While most chatbots respond to user requests in a synchronous way, there are scenarios when bots don't perform actions based on an explicit user request, such as for alerts or notifications. In today's DevByte video, I'm going to show you how to send messages asynchronously to rooms or direct messages (DMs) in Hangouts Chat, the team collaboration and communication tool in G Suite.

What comes to mind when you think of a bot in a chat room? Perhaps a user wants the last quarter's European sales numbers, or maybe, they want to look up local weather or the next movie showtime. Assuming there's a bot for whatever the request is, a user will either send a direct message (DM) to that bot or @mention the bot from within a chat room. The bot then fields the request (sent to it by the Hangouts Chat service), performs any necessary magic, and responds back to the user in that "space," the generic nomenclature for a room or DM.

Our previous DevByte video for the Hangouts Chat bot framework shows developers what bots and the framework are all about, as well as how to build one of these bots in both Python and JavaScript. However, those bots respond synchronously to a user request. That doesn't suffice when users want to be notified when a long-running background job has completed, when a late bus or train will be arriving soon, or when one of their servers has just gone down. Such alerts can come from a bot, but also from a monitoring application. In the latest episode of the G Suite Dev Show, learn how to integrate this functionality into either type of application.

From the video, you can see that alerts and notifications are "out-of-band" messages, meaning they can come in at any time. The Hangouts Chat bot framework provides several ways to send asynchronous messages to a room or DM, generically referred to as a "space." The first is the HTTP-based REST API. The other way is using what are known as "incoming webhooks."

The REST API is used by bots to send messages into a space. Since a bot will never be a human user, a Google service account is required. Once you create a service account for your Hangouts Chat bot in the developers console, you can download its credentials needed to communicate with the API. Below is a short Python sample snippet that uses the API to send a message asynchronously to a space.

from apiclient import discovery
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

# Authorize as the bot's service account and build the Hangouts Chat API client.
SCOPES = 'https://www.googleapis.com/auth/chat.bot'
creds = ServiceAccountCredentials.from_json_keyfile_name(
        'svc_acct.json', SCOPES)
CHAT = discovery.build('chat', 'v1', http=creds.authorize(Http()))

# Create a message asynchronously in the target space (room or DM).
room = 'spaces/<ROOM-or-DM>'
message = {'text': 'Hello world!'}
CHAT.spaces().messages().create(parent=room, body=message).execute()

The alternative to using the API with service accounts is the concept of incoming webhooks. Webhooks are a quick and easy way to send messages into any room or DM without configuring a full bot, e.g., for monitoring apps. Webhooks also allow you to integrate your custom workflows, such as when a new customer is added to the corporate CRM (customer relationship management system), as well as others mentioned above. Below is a Python snippet that uses an incoming webhook to communicate into a space asynchronously.

import json
import requests

# Incoming webhook URL, copied from the room's webhook configuration.
URL = 'https://chat.googleapis.com/...&thread_key=T12345'
message = {'text': 'Hello world!'}
requests.post(URL, data=json.dumps(message))

Since incoming webhooks are merely endpoints you HTTP POST to, you can even use curl to send a message to a Hangouts Chat space from the command-line:

curl \
-X POST \
-H 'Content-Type: application/json' \
'https://chat.googleapis.com/...&thread_key=T12345' \
-d '{"text": "Hello!"}'

To get started, take a look at the Hangouts Chat developer documentation, especially the specific pages linked to above. We hope this video helps you take your bot development skills to the next level by showing you how to send messages to the Hangouts Chat service asynchronously.

Understanding error budget overspend – part one – CRE life lessons



In previous CRE Life Lessons blog posts, the Google Customer Reliability Engineering (CRE) team has spent a lot of time talking about service level objectives (SLOs), which measure whether your service is meeting its reliability targets from the point of view of its end users. Your SLO lets you specify how much downtime your service can have in a given period—for example, 43 minutes every 30 days for a service that needs to be available 99.9% of the time. This downtime allowance is your error budget. Like a household budget, it’s OK to spend this error budget over those 30 days, as long as you don’t spend more than that.
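
To make that arithmetic concrete, here's a minimal sketch of how the downtime allowance falls out of the SLO target, using the 99.9% target and 30-day window from the example above:

# Sketch: derive the downtime allowance (error budget) from an availability
# SLO target and a measurement window.
slo_target = 0.999
window_minutes = 30 * 24 * 60           # 43,200 minutes in a 30-day period

error_budget_minutes = (1 - slo_target) * window_minutes
print(round(error_budget_minutes, 1))   # 43.2 -> roughly 43 minutes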

If you do run out of your error budget, either by spending a bit too much each day, or by having a major outage that blows it all at once, that tells you that your service’s users are suffering too much and it’s time to give them a break. How do you do that? Here are a few questions to consider to see if you need to recalibrate your error budget.

Where are you spending your error budget?

Your SLOs will be target values for corresponding service level indicators (SLIs), which are the measurements of the critical parts of the end-user experience. One SLI for the 99.9%-available example system above might be “the percentage of HTTP responses that are successful (200), out of all 2xx and 5xx HTTP responses.” You calculate your error budget spend from the percentage of the measurement period where your service fails to reach all of its SLO targets; depending on the granularity and accuracy of your SLI measurement, this might be done on a per-minute, per-hour or even per-day basis.
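
As a rough illustration of that calculation, the sketch below computes a per-minute availability SLI and counts the minutes that miss the target as error budget spend. The per-minute response counts are made-up numbers, not real measurements.

# Sketch: compute a per-minute availability SLI and count the minutes that
# miss the SLO target as error budget spend. Counts are illustrative only.
SLO_TARGET = 0.999

def availability_sli(success_2xx, errors_5xx):
    total = success_2xx + errors_5xx
    return 1.0 if total == 0 else success_2xx / total

# One entry per measurement minute: (2xx count, 5xx count).
minutes = [(1000, 0), (980, 25), (995, 1), (500, 600)]

bad_minutes = sum(1 for ok, err in minutes
                  if availability_sli(ok, err) < SLO_TARGET)
print(f"{bad_minutes} of {len(minutes)} measured minutes spent error budget")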

When you analyze your error budget spend day-by-day, you should try to attribute the main causes of error budget spend over the measurement period:
  • Do most of your errors happen when you’re doing binary releases? That implies that you’re not going to be able to keep within budget unless you do something to make releases less frequent, less error-prone or lower-impact when there is an error.
  • Are you seeing steady error spend coming from intermittent application failure, which adds up to the majority of your budget? That’s telling you that you’ve got a fundamental failure in your application. It’s a strong signal that you need to drill down in your logs to find the troublesome queries, and that you should expect to dedicate some of your engineers to identify the root causes and either address them directly or plan to fix them in your next project planning cycle.
  • Are large chunks of your error budget getting spent by major application failures, where most of your service goes down for many minutes due to configuration pushes, excessive load or queries-of-death? You need to run effective postmortems to identify the root causes and mitigate them. You will need to redirect some of your development engineering effort to address the top action items from those postmortems—so feature development and releases will naturally happen more slowly. (More on this in another post.)
  • Is the bulk of your spend coming from a dependency outside your control, such as a critical backend or your compute platform? You’ll need to address the dependency or platform owner directly, showing them your SLI metrics and negotiating about how they can make their service more reliable—or how you can be more resilient to the expected failure modes.
For each of these cases, you have an objective measurement of whether the problem has been sufficiently addressed: you will expect your SLIs to stay high in circumstances where previously they plummeted.

Are you measuring the right signal?

Something else you should consider: Did the outage reflect real user pain? If you have a strong indicator that users weren’t concerned by a major outage that spent a chunk of your error budget, then you may not have to change your development practices or architecture, but you still have something to fix. Either you should determine a new, lower target level for your SLO, or you should find a different SLI that better represents the user experience.

Can your users tolerate a slightly worse experience?

Suppose you’re trying to run your service at a 99.9% availability level, with the corresponding 43-minute-per-month error budget, but you’re consistently failing to meet that; you’re spending 50-60 minutes per month. How much does that actually matter?

You probably have business intelligence channels for measuring customer happiness in terms of time spent on your site, purchase rate, support tickets raised and other fairly direct measurements of user happiness. Evaluate those statistics against your SLIs: Are your budget overspend periods correlated with less user happiness, and if so, what’s the correlation function? If a 50% error budget overspend corresponds to a 1% decrease in customer revenue, then you may feel that you can adjust your SLO target and aim for a 99.5% availability level, rather than spend a lot of engineering effort trying to raise your availability to the original target.

What is important in this case is to have, and document, the data used to determine the SLO target. You don’t want to fall into the trap of increasing your error budget by 50% each period because “users don’t really care”—you need to articulate the tradeoff in user happiness/spend vs. reliability in your SLO definition. An SLO specification shouldn’t just contain numbers and metric names. It should also reference the logic and data used to determine the SLO target.

When your users’ experience isn’t definitive

It may be true that the customer is always right—but what if your service’s users are part of your company? In some cases, the overall business decision may be that continuing to build and release the software is in the best interest of the company as a whole, even if you’re consistently going over budget. The error budget spend may cause an inconvenience to employees, but failing to release new versions of the software would have a significant cost to the company that outweighs user inconvenience.

This can occur when there's a disconnect between what the users of the software are perceived to need (for example, the 99.9% availability target of this example service) and what the executives who pay for the development of the software think these users should tolerate in the name of greater velocity.

Now that we understand what an error budget is telling us, in part two of this post we will look at how best to keep a positive balance.

Interested to learn more about site reliability engineering (SRE) in practice? We’ll be discussing how to apply SRE principles to deploy and manage services at Next ‘18 in July. Join us!


Good housekeeping for error budgets – part two – CRE life lessons



In part one of this CRE Life Lessons post, we talked about the error budget for a service and what it tells you about how close your service is to breaching its reliability targets, the service-level objectives (SLOs). Once you’ve done some digging to understand why you may be consistently overspending your error budget, it’s time to fix the root causes of the problem.

Paying off your error budget debt

Those of us who have held a significant balance on a credit card are familiar with the large bite it can take out of a monthly household budget. Good housekeeping practice means that we should be looking to pay down as much of that debt as possible in order to shrink the monthly charge. Error budgets work the same way.

Once your error budget spend analysis identifies the major contributors to the spending rate, you should be prepared to redirect your developers’ efforts from new features to addressing those causes of spend. This might mean an improved QA environment or test set to catch more release errors before they hit production, or better rollout automation and monitoring to detect and roll back bad releases more quickly.

The effect of this approach is likely to be that you make less frequent releases, or that each release has fewer changes and hence is less likely to contain an error-budget-impacting problem. You’re slowing down release velocity temporarily in order to allow safer releasing at the original velocity in the future.

Looking at downstream spend

Another issue to consider is: What if the error budget overspend wasn’t the developers’ fault? If your data center or cloud platform has a hardware outage, there’s not much the developers can do about it. Sure, your end users don’t care why the service broke, and you don’t want to make their lives worse, but it seems harsh to ding your developers for someone else’s failure. This should surface in your analysis of error budget spend, as described above.

What next? You may need to talk to the owners of that platform about their historical (measured) reliability and how it squares with you trying to run your service at your target SLO. It may be that changes are needed on both sides: You change your system to be able to detect and tolerate certain failures from them, and they improve detection and resolution time of the failure cases that impact you.

Often, a platform is not going to change significantly, so you have to decide how to account for that error spend in future. You may decide that it’s significant enough that you need to increase your resilience to it, e.g., by implementing (and exercising!) the option to fail your service automatically out of an affected cloud region over to an unaffected region. (See our “Defining SLOs for services with dependencies” blog post, which dealt with this problem in depth.)

When your releases are the problem

It could be, however, that your analysis leads you to the conclusion that software releases are a major source of your error budget spend. Certainly, our experience at Google is that binary rollouts are one of the top sources of outages; many a postmortem starts “We rolled out a new release of the software, which we thought did <X>, which our users would love, but in fact it did <Y>, which caused users to see errors in their browser/be unable to view photos of their cat/receive 100 email bounces a minute.”

The canonical response to a series of bad releases that overspend the error budget is to impose a freeze on the release of new features. This can be seen as a last-resort action: it acknowledges that the existing efforts to pay down debt have not delivered sufficient reliability improvement, so lowering the rate of change is required instead to protect the user experience. A freeze of this nature can also give development teams the space and direction to refocus their attention away from features and onto reliability improvements. However, it’s a drastic step to take.

Other ways you can avoid freezing include:
  • Make an explicitly agreed-upon adjustment to the feature vs. reliability work balance. For example, your company normally does two-week sprints, where 95% of the work is feature-driven and 5% is postmortem action items and other reliability work. You agree that while your service is out of SLO, the sprints will instead be 50/50.
  • Overprovision your service. For instance, pay more money to replicate to another cloud zone and/or region, or run more replicas of your service to handle higher traffic loads. This is only effective if you have determined that this approach will help mitigate error budget spend.
  • Declare a reliability incident. Appoint a specific person to analyze the postmortem and error budget spend and come up with recommendations. It’s important that the business has pre-committed to prioritizing those recommendations.

Winter is coming

If you really have to impose a new features freeze, how long should it last? Generally, it should last until you have removed the overspend in your error budget, and have confidence it will not recur. We’ve seen two principal methods of error budget calculation: fixed intervals (say, each calendar month) and rolling intervals (the last N days).

If you operate a fixed interval for your error budget calculation, your reaction to an error budget overspend depends on when it happens. If it happens on day 1, you spend the whole month frozen; if it’s on day 28, you may not actually need to stop releasing because your next release may be in the next month, when the error budget is reset. Unless your customer is also sensitive to outages on a calendar month basis, this doesn’t seem to be a good practice to optimize your customers’ experience.

For a rolling 30-day error budget measurement period, your 99.9%-available service regains the error budget spent on day N-30, so if your budget is 20 minutes overspent, you need to wait until those 20 minutes of debt have dropped out of the window. So if you spent 15 minutes of your budget on day N-29 and five minutes on day N-28, you’d need to wait two more days to get back to a positive balance, assuming no further outages. In practice, you’d probably wait until you accumulate a buffer of 20% of your error budget so you are resilient to minor unexpected spends.
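
Here's a minimal sketch of that rolling-window bookkeeping in Python. It uses the illustrative numbers above (15 minutes on day N-29, 5 minutes on day N-28), and assumes the rest of the overspend came from a single 43.2-minute outage today so the totals line up:

# Sketch of a rolling 30-day error budget: only the most recent 30 days of
# spend count, so old spend "ages out" of the window. Numbers are illustrative.
WINDOW_DAYS = 30
BUDGET_MINUTES = 43.2   # 99.9% availability over 30 days

def remaining_budget(daily_spend_minutes):
    # Only the most recent WINDOW_DAYS entries count toward the budget.
    return BUDGET_MINUTES - sum(daily_spend_minutes[-WINDOW_DAYS:])

# 15 min spent on day N-29, 5 min on day N-28, and a 43.2-minute outage today.
spend = [15.0, 5.0] + [0.0] * 27 + [43.2]
print(round(remaining_budget(spend), 1))   # -20.0: budget is 20 minutes overspent

# Two quiet days later, the 15- and 5-minute spends drop out of the window.
spend += [0.0, 0.0]
print(round(remaining_budget(spend), 1))   # 0.0: back in balance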

Following this guidance, if you have a major outage that spends your entire month’s budget in one day, then you’d be frozen for an entire month. In practice, this may not be acceptable. At the very least, though, you should be drastically down-scaling your release velocity in order to have more engineering time to devote to fixing the root causes (see “Paying off your error budget debt” above). There are other approaches, too: Check out the discussion about blocking releases in an earlier episode of CRE Life Lessons, where we analyzed an example escalation policy.

As you can see, the rolling period for error budget measurement is less prone to a varying reaction depending on the particular date of an outage. We recommend that you adopt this approach if you can, though admittedly it can be challenging to accumulate this kind of data in monitoring tools currently.

The long-term costs of freezes

Freezing the release of new features isn’t free of cost. In a worst-case scenario, if your developers are continuing new feature development but not releasing those features to users, the changes will build up, and when you finally resume releases it is almost inevitable that you’re going to see a series of broken releases. We’ve seen this happen in practice: if we impose a freeze on a service over an event like Black Friday or New Year’s, we expect that the week following the freeze will be unusually busy with service failures as all the backed-up changes reach users. To avoid this, it’s important to re-emphasize to teams affected by the freeze that it is intended to provide space to focus on reliability development, not feature development.

Sometimes it’s not possible to freeze all releases. Your company may have a major event coming up, such as a conference, and so there’s a compelling need to push certain new features into production no matter what the recent experience of its users. One process you could adopt in this case is the concept of a silver bullet: The product management team has a (very limited) right to override a release freeze to deploy a critical feature. To make this approach work well, that right needs to be expensive to exercise and limited in frequency: The spend of a silver bullet should be regarded as a failure, and require a postmortem to analyze how it came about and how to mitigate the risk of it happening again.

Using the error budget to your (and your users’) advantage

An error budget is a crucial concept when you’re taking a principled approach to service reliability. Like a household budget, it’s there for you (the service owner) to spend, and it's important for the service stakeholders to agree on what should happen when you overspend it ahead of doing so. If you find you’ve overspent, a feature freeze can be an effective tool to prioritize development time toward reliability improvements. But remember that reflexively freezing your releases when you blow through your error budget isn’t always the appropriate response. Consider where your budget is being spent, how to reduce the major sources of spend and whether some loosening of the purse strings is in order. The most important principle: Do it based on data!

Interested to learn more about site reliability engineering (SRE) in practice? We’ll be discussing how to apply SRE principles to deploy and manage services at Next ‘18 in July. Join us!


Why we believe in an open cloud



Open clouds matter more now than ever. While most companies today use a single public cloud provider in addition to their on-premises environment, research shows that most companies will likely adopt multiple public and private clouds in the coming years. In fact, according to a 2018 RightScale study, 81 percent of enterprises with 1,000 or more employees have a multi-cloud strategy, and if you consider SaaS, most organizations are doing multi-cloud already.

Open clouds let customers freely choose which combination of services and providers will best meet their needs over time. Open clouds let customers orchestrate their infrastructure effectively across hybrid-cloud environments.

We believe in three principles for an open cloud:
  1. Open is about the power to pick up an app and move it—to and from on-premises, our cloud, or another cloud—at any time.
  2. Open-source software permits a richness of thought and continuous feedback loop with users.
  3. Open APIs preserve everyone’s ability to build on each other’s work.

1. Open is about the power to pick up an app and move it

An open cloud is grounded in a belief that being tied to a particular cloud shouldn’t get in the way of achieving your goals. An open cloud embraces the idea that the power to deliver your apps to different clouds while using a common development and operations approach will help you meet whatever your priority is at any given time—whether that’s making the most of skills shared widely across your teams or rapidly accelerating innovation. Open source is an enabler of open clouds because open source in the cloud preserves your control over where you deploy your IT investments. For example, customers are using Kubernetes to manage containers and TensorFlow to build machine learning models on-premises and on multiple clouds.

2. Open-source software permits a richness of thought and continuous feedback loop with users

Through a continuous feedback loop with users, open source software (OSS) results in better software, faster. It also requires substantial time and investment from the people and companies leading open source projects. Here are examples of Google’s commitment to OSS and the varying levels of work required:
  • OSS such as Android has an open code base, and development is the sole responsibility of one organization.
  • OSS with community-driven changes, such as TensorFlow, involves coordination between many companies and individuals.
  • OSS with community-driven strategy, for example collaboration with the Linux Foundation and the Kubernetes community, involves collaborative decision-making and accepting consensus over control.
Open source is so important to Google that we call it out twice in our corporate philosophies, and we encourage employees, and in fact all developers, to engage with open source.

Using BigQuery to analyze GHarchive.org data, we found that in 2017, over 5,500 Googlers submitted code to nearly 26,000 repositories, created over 215,000 pull requests, and engaged with countless communities through almost 450,000 comments. A comparative analysis of Google’s contribution to open source provides a useful relative position of the leading companies in open source based on normalized data.

Googlers are active contributors to popular projects you may have heard of including Linux, LLVM, Samba, and Git.

Google regularly open-sources internal projects


3. Open APIs preserve everyone’s ability to build on each other’s work

Open APIs preserve everyone’s ability to build on each other’s work, improving software iteratively and collaboratively. Open APIs empower companies and individual developers to change service providers at will. Peer-reviewed research shows that open APIs drive faster innovation across the industry and in any given ecosystem. Open APIs depend on the right to reuse established APIs by creating independent-yet-compatible implementations. Google is committed to supporting open APIs through our membership in the OpenAPI Initiative, our involvement in the OpenAPI Specification, our support of gRPC, Cloud Bigtable’s compatibility with the HBase API, Cloud Spanner’s and BigQuery’s compatibility with SQL:2011 (with extensions), and Cloud Storage’s compatibility with shared APIs.

Build an open cloud with us

If you believe in an open cloud like we do, we’d love your participation. You can help by contributing to and using open source libraries, and asking your infrastructure and cloud vendors what they’re doing to keep workloads free from lock-in. We believe open ecosystems grow the fastest and are more resilient and adaptable in the face of change. Like you, we’re in it for the long-term.



It’s worth noting that not all of Google’s products will be open in every way at every stage of their life cycle. Openness is less an absolute and more a mindset for conducting business in general. You can, however, expect Google Cloud to continue investing in openness across our products over time, to contribute to open source projects, and to open source some of our internal projects.

If you believe open clouds are an important part of making this multi-cloud world a place in which everyone can thrive, we encourage you to check out our new open cloud website where we offer more detailed definitions and examples of the terms, concepts, and ideas we’ve discussed here: cloud.google.com/opencloud.

Announcing MongoDB Atlas free tier on GCP



Earlier this year, in response to strong customer demand, we announced that we were expanding region support for MongoDB Atlas. The MongoDB NoSQL database is hugely popular, and the MongoDB Atlas cloud version makes it easy to manage on Google Cloud Platform (GCP). We heard great feedback from users, so we’re further lowering the barrier to get started on MongoDB Atlas and GCP.

We’re pleased to announce that as of today, MongoDB will offer a free tier of MongoDB Atlas on GCP in three supported regions, strategically located in North America, Europe and Asia Pacific in recognition of our wide user install base.

The free tier gives developers a no-cost sandbox environment for MongoDB Atlas on GCP. You can test any potential MongoDB workloads on the free tier and upgrade to a larger paid Atlas cluster once you have confidence in our cloud products and performance.

As of today, these specific regions are supported by the Atlas free tier:
  1. Iowa (us-central1)
  2. Belgium (europe-west1)
  3. Singapore (asia-southeast1)
To get started, you’ll just need to log in to your MongoDB console, select “Build a New Cluster,” pick “Google Cloud Platform,” and look for the “Free Tier Available” message. The free tier utilizes MongoDB’s M0 instances. An M0 cluster is a sandbox MongoDB environment for prototyping and early development with 512MB of storage space. It also comes with strong enterprise features such as always-on authentication, end-to-end encryption and high availability, as well as monitoring. Happy experimenting!
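
Once your M0 cluster is running, connecting from an application works like any other MongoDB deployment. Here's a minimal sketch using the PyMongo driver; the connection string below is a placeholder, so copy the real SRV URI from the "Connect" dialog in the Atlas console.

# Sketch: connect to an Atlas M0 cluster with PyMongo and write a document.
# The URI is a placeholder; use the SRV URI shown in the Atlas console.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<USER>:<PASSWORD>@<CLUSTER>.mongodb.net/test")

db = client.test
result = db.scores.insert_one({"player": "alice", "score": 42})
print(result.inserted_id)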


New Cloud Filestore service brings GCP users high-performance file storage



As we celebrate the upcoming Los Angeles region for Google Cloud Platform (GCP) in one of the creative centers of the world, we’re really excited about helping you bring your creative visions to life. At Google, we want to empower artist collaboration and creation with high-performance cloud technology. We know folks need to create, read and write large files with low latency. We also know that film studios and production shops are always looking to render movies and create CGI images faster and more efficiently. So alongside our LA region launch, we’re pleased to enable these creative projects by bringing file storage capabilities to GCP for the first time with Cloud Filestore.

Cloud Filestore (beta) is managed file storage for applications that require a file system interface and a shared file system. It gives users a simple, integrated, native experience for standing up fully managed network-attached storage (NAS) with their Google Compute Engine and Kubernetes Engine instances.

We’re pleased to add Cloud Filestore to the GCP storage portfolio because it enables native platform support for a broad range of enterprise applications that depend on a shared file system.


Cloud Filestore will be available as a storage option in the GCP console

We're especially excited about the high performance that Cloud Filestore offers to applications that require high throughput, low latency and high IOPS. Applications such as content management systems, website hosting, render farms and virtual workstations for artists typically require low-latency file operations, high-performance random I/O, and high throughput and performance for metadata-intensive operations. We’ve heard from some of our early users that they’ve saved time serving up websites with Cloud Filestore, cut down on hardware needs and sped up the compute-intensive process of rendering a movie.

Putting Cloud Filestore into practice

For organizations with lots of rich unstructured content, Cloud Filestore is a good place to keep it. For example, graphic design, video and image editing, and other media workflows use files as both input and output. Filestore also helps creators access shared storage to manipulate and produce these types of large files. If you’re a web developer creating websites and blogs that serve file content to your audience, you’ll find it easy to integrate Cloud Filestore with web software like WordPress. That’s what Jellyfish did.

Jellyfish is a boutique marketing agency focused on delivering high-performance marketing services to their global clients. A major part of that service is delivering a modern and flexible digital web presence.

“Wordpress hosts 30% of the world’s websites, so delivering a highly available and high performance Wordpress solution for our clients is critical to our business. Cloud Filestore enabled us to simply and natively integrate Wordpress on Kubernetes Engine, and take advantage of the flexibility that will provide our team.”
- Ashley Maloney, Lead DevOps Engineer at Jellyfish Online Marketing
Cloud Filestore also provides the reliability and consistency that latency-sensitive workloads need. One example is fuzzing, the process of running millions of permutations to identify security vulnerabilities in code. At Google, ClusterFuzz is the distributed fuzzing infrastructure behind Chrome and OSS-Fuzz that’s built for fuzzing at scale. The ClusterFuzz team needed a shared storage platform to store the millions of files that are used as input for fuzzing mutations.
“We focus on simplicity that helps us scale. Having grown from a hundred VMs to tens of thousands of VMs, we appreciate technology that is efficient, reliable, requires little to no configuration and scales seamlessly without management. It took one premium Filestore instance to support a workload that previously required 16 powerful servers. That frees us to focus on making Chrome and OSS safer and more reliable.”
- Abhishek Arya, Information Security Engineer, Google Chrome
Write once, read many is another type of workload where consistency and reliability are critical. At ever.ai, they’re training an advanced facial recognition platform on 12 billion photos and videos for tens of millions of users in 95 countries. The team constantly needs to share large amounts of data between many servers; the data is written once but read many times. Writing this data to non-POSIX object storage was a challenge, because reading it back required custom code or downloading the data. So they turned to Cloud Filestore.
“Cloud Filestore was easy to provision and mount, and reliable for the kind of workload we have. Having a POSIX file system that we can mount and use directly helps us speed-read our files, especially on new machines. We can also use the normal I/O features of any language and don’t have to use a specific SDK to use an object store."
- Charlie Rice, Chief Technology Officer, ever.ai
Cloud Filestore is also particularly helpful with rendering requirements. Rendering is the process by which media production companies create computer-generated images by running specialized imaging software to create one or more frames of a movie. We’ve just announced our newest GCP region in Los Angeles, where we expect there are more than a few of you visual effects artists and designers who can use Cloud Filestore. Let’s take a closer look at an example rendering workflow so you can see how Cloud Filestore can read and write data for this specialized purpose without tying up on-site hardware.

Using Cloud Filestore for rendering

When you render a movie, the rendering job typically runs across fleets ("render farms") of compute machines, all of which mount a shared file system. Chances are you’re doing this with on-premises machines and on-premises files, but with Cloud Filestore you now have a cloud option.

To get started, create a Cloud Filestore instance, and seed it with the 3D models and raw footage for the render. Set up your Compute Engine instance templates to mount the Cloud Filestore instance. Once that's set, spin up your render farm with however many nodes you need, and kick off your rendering job. The render nodes all concurrently read the same source data set from the Network File System (NFS) share, perform the rendering computations and write the output artifacts back to the share. Finally, your reassembly process reads the artifacts from Cloud Filestore and assembles it and writes into the final form.

Cloud Filestore price and performance

We offer two price-for-performance tiers. The high-performance Premium tier is $0.30 per GB per month, and the midrange-performance Standard tier is $0.20 per GB per month in us-east1, us-central1 and us-west1 (other regions vary). To keep your bill simple and predictable, we charge for provisioned capacity. You can resize on demand without downtime, up to a maximum of 64TB. We do not charge per-operation fees. Networking is free within the same zone, and standard cross-zone egress networking charges apply.
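
To make the provisioned-capacity model concrete, here's a small sketch of the monthly math at the listed rates. It assumes binary units (1TB = 1,024GB); decimal units would change the totals slightly.

# Sketch: monthly cost under provisioned-capacity pricing at the listed rates.
# You pay for what you provision, not what you store, with no per-op fees.
PRICE_PER_GB_MONTH = {"premium": 0.30, "standard": 0.20}

def monthly_cost(tier, provisioned_gb):
    return provisioned_gb * PRICE_PER_GB_MONTH[tier]

print(monthly_cost("premium", 2 * 1024))    # 2TB Premium   -> $614.40/month
print(monthly_cost("standard", 10 * 1024))  # 10TB Standard -> $2,048.00/month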

Cloud Filestore Premium instance throughput is designed to provide up to 700 MB/s and 30,000 IOPS for reads, regardless of the Cloud Filestore instance capacity. Standard instances are lower priced and performance scales with capacity, hitting peak performance at 10TB and above. A simple performance model makes it easier to predict costs and optimize configurations. High performance means your applications run faster. As you can see in the image below, the Cloud Filestore Premium tier outperforms the design goal with the specified benchmarks, based on performance testing we completed in-house.

Trying Cloud Filestore for yourself

Cloud Filestore will release into beta next month. To sign up to be notified about the beta release, complete this request form. Visit our Filestore page to learn more.

In addition to our new Cloud Filestore offering, we partner with many file storage providers to meet all of your file needs. We recently announced NetApp Cloud Volumes for GCP and you can find other partner solutions in our launcher.

If you’re interested in learning more about file storage from Google, check out this session at Next 2018 next month. For more information, and to register, visit the Next ‘18 website.

Bust a move with Transfer Appliance, now generally available in the U.S.



As we celebrate the upcoming Los Angeles Google Cloud Platform (GCP) region in one of the creative centers of the world, we are excited to share news about a product that can help you get your data there as fast as possible. Google Transfer Appliance is now generally available in the U.S., with a few new features that will simplify moving data to Google Cloud Storage. Customers have been using Transfer Appliance for almost a year, and we’ve heard great feedback.

The Transfer Appliance is a high-capacity server that lets you transfer large amounts of data to GCP, quickly and securely. It’s recommended if you’re moving more than 20TB of data, or data that would take more than a week to upload.
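
To see why the one-week guideline matters, here's a rough back-of-the-envelope sketch of upload time over a sustained network link; the 70% link-utilization factor is an assumption, not a measured figure.

# Sketch: estimate how long a network upload would take, to compare against
# the "more than a week to upload" guideline above.
def upload_days(data_tb, link_mbps, utilization=0.7):
    bits = data_tb * 1e12 * 8                       # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86400

print(round(upload_days(20, 100), 1))    # 20TB over 100 Mbps: roughly 26 days
print(round(upload_days(20, 1000), 1))   # 20TB over 1 Gbps: roughly 2.6 days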

You can now request a Transfer Appliance directly from your Google Cloud Platform console. Indicate the amount of data you’re looking to transfer, and our team will help you choose the version that is the best fit for your needs.

The service comes in two configurations: 100TB or 480TB of raw storage capacity. We see typical data compression rates of 2x the raw capacity. The 100TB model is priced at $300, plus express shipping (approximately $500); the 480TB model is priced at $1,800, plus shipping (approximately $900).
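
Putting those numbers together, here's a quick sketch of the rough cost per effective terabyte, assuming the typical 2x compression and the approximate shipping estimates quoted above:

# Sketch: rough cost per effective TB for each configuration, assuming the
# typical 2x compression and approximate shipping costs quoted above.
configs = {
    "100TB": {"raw_tb": 100, "price": 300, "shipping": 500},
    "480TB": {"raw_tb": 480, "price": 1800, "shipping": 900},
}

for name, c in configs.items():
    effective_tb = c["raw_tb"] * 2
    total_cost = c["price"] + c["shipping"]
    print(f"{name}: ~${total_cost / effective_tb:.2f} per effective TB")
# 100TB: ~$4.00 per effective TB
# 480TB: ~$2.81 per effective TB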

You can mount Transfer Appliance as an NFS volume, making it easy to drag and drop files, or rsync, from your current NAS to the appliance. This feature simplifies the transfer of file-based content to Cloud Storage, and helps our migration partners expedite the move for customers.
"SADA Systems provides expert cloud consultation and technical services, helping customers get the most out of their Google Cloud investment. We found Transfer Appliance helps us transition the customer to the cloud faster and more efficiently by providing a secure data transfer strategy."
-Simon Margolis, Director of Cloud Platform, SADA Systems
Transfer Appliance can also help you transition your backup workflow to the cloud quickly. To do that, move the bulk of your current backup data offline using Transfer Appliance, and then incrementally back up to GCP over the network from there. Partners like Commvault can help you do this.

With this release, you’ll also find a more visible end-to-end integrity check, so you can be confident that every bit was transferred as is, and have peace of mind in deleting source data.

Transfer Appliance in action

In developing Transfer Appliance, we built a device designed for the data center, so it slides into a standard 19” rack. That has been a positive experience for our early customers, even those with floating data centers (yes, actually floating; see below for more).

We’ve seen our customers successfully use Transfer Appliance for the following use cases:
  • Migrate your data center (or parts of it) to the cloud.
  • Kick-start your ML or analytics project by transferring test data and staging it quickly.
  • Move large archives of content like creative libraries, videos, images, regulatory or backup data to Cloud Storage.
  • Collect data from research bodies or data providers and move it to Google Cloud for analysis.
We’ve heard about lots of innovative, interesting data projects powered by Transfer Appliance. Here are a few of them.

One early adopter, Schmidt Ocean Institute, is a private non-profit foundation that combines advanced science with state-of-the-art technology to achieve lasting results in ocean research. Their goals are to catalyze sharing of information and to communicate this knowledge to audiences around the world. For example, the Schmidt Ocean Institute owns and operates research vessel Falkor, the first oceanographic research vessel with a high-performance cloud computing system installed onboard. Scientists run models and software and can plan missions in near-real time while at sea. With the state-of-the-art technologies onboard, scientists contribute scientific data to the oceanographic community at large, very quickly. Schmidt Ocean Institute uses Transfer Appliance to safely get the data back to shore and publicly available to the research community as fast as possible.

“We needed a way to simplify the manual and complex process of copying, transporting and mailing hard drives of research data, as well as making it available to the scientific community as quickly as possible. We are able to mount the Transfer Appliance onboard to store the large amounts of data that result from our research expeditions and easily transfer it to Google Cloud Storage post-cruise. Once the data is in Google Cloud Storage, it’s easy to disseminate research data quickly to the community.”
-Allison Miller, Research Program Manager, Schmidt Ocean Institute

Beatport, a division of LiveStyle, serves an audience of electronic music DJs, producers and their fans. Google Transfer Appliance afforded Beatport the opportunity to rethink their storage architecture in the cloud without affecting their customer-facing network in the process.

“DJs, music producers and fans all rely on Beatport as the home for the world’s electronic music. By moving our library to Google Cloud Storage, we can access our audio data with the advanced tools that Google Cloud Platform has to offer. Managing tens of millions of lossless quality files poses unique challenges. Migrating to the highly performant Cloud Storage puts our wealth of audio data instantly at the fingertips of our technology team. Transfer Appliance made that move easier for our team.”
-Jonathan Steffen, CIO, beatport
Eleven Inc. creates content, brand experiences and customer activation strategies for clients across the globe. Through years of work for their clients, Eleven built a large library of creative digital assets and wanted a way to cost-effectively store that data in the cloud. Facing ISP network constraints and a desire to free up space on their local asset server quickly, Eleven Inc. used Transfer Appliance to facilitate their migration.

“Working with Transfer Appliance was a smooth experience. Rack, capture and ship. And now that our creative library is in Google Cloud Storage, it's much easier to think about ways to more efficiently manage the data throughout its life-cycle.”
-Joe Mitchell, Director of Information Systems
amplified ai combines extensive IP industry experience with deep learning to offer instant patent intelligence to inventors and attorneys. This requires a lot of patent data for building models. Transfer Appliance helped amplified ai move TBs of this specialized essential data to the cloud quickly.

“My hands are already full building deep learning models on massive, disparate data without also needing to worry about physically moving data around. Transfer Appliance was easy to understand, easy to install, and made it easy to capture and transfer data. It just did what it was supposed to do and saved me time which, for a busy startup, is the most valuable asset.”
-Chris Grainger, Founder & CTO, amplified ai
Airbus Defence and Space Geo Inc. uses their exclusive access to radar and optical satellites to offer a stunning Earth observation images library. As part of a major cloud migration effort, Airbus moved hundreds of TBs of this data to the cloud with Transfer Appliance so they can better serve images to clients from Cloud Storage. They improved data quality along with the migration by using Transfer Appliance.

“We needed to liberate. To flex on demand and scale in the cloud, and unleash our creativity. Transfer Appliance was a catalyst for that. In addition to migrating an amount of data that would not have been possible over the network, this transfer gave us the opportunity to improve our storage in the process—to clean out the clutter.”
-Dave Wright, CTO, Airbus Defense and Space Geo Inc.


National Collegiate Sports Archives (NCSA) is the creator and owner of the VAULT, which contains years worth of college sports footage. NCSA digitizes archival sports footage from leading schools and delivers it via mobile, advertising and social media platforms. With a lot of precious footage to deliver to college sports fans around the globe, NCSA needed a way to move data into Google Cloud Platform quickly and with zero disruption for their users.

“With a huge archive of collegiate sports moments, we wanted to get that content into the cloud and do it in a way that provides value to the business. I was looking for a solution that would cost-effectively, simply and safely execute the transfer and let our teams focus on improving the experience for our users. Transfer Appliance made it simple to capture data in our data center and ship it to Google Cloud.”
-Jody Smith, Technology Lead, NCSA

Tackle your data migration needs with Transfer Appliance

To get detailed information on Transfer Appliance, check out our documentation. And visit our Data Transfer page to learn more about our other cloud data transfer options.

We’re looking forward to bringing Transfer Appliance to regions outside of the U.S. in the coming months. But we need your help: Where should we deploy first? If you are interested in offline data transfer but not located in the U.S., please indicate so in the request form.

If you’re interested in learning more about cloud data migration strategies, check out this session at Next 2018 next month. For more information, and to register, visit the Next ‘18 website.

Google Cloud for Electronic Design Automation: new partners



A popular enterprise use case for Google Cloud is electronic design automation (EDA)—designing electronic systems such as integrated circuits and printed circuit boards. EDA workloads, like simulations and field solvers, can be incredibly computationally intensive. They may require a few thousand CPUs, sometimes even a few hundred thousand CPUs, but only for the duration of the run. Instead of building up massive server farms that are oversubscribed during peak times and sit idle for the rest of the time, you can use Google Cloud Platform (GCP) compute and storage resources to implement large-scale modeling and simulation grids.

Our partnerships with software and service providers make Google Cloud an even stronger platform for EDA. These solutions deliver elastic infrastructure and improved time-to-market for customers like eSilicon, as described here.

Scalable simulation capacity on GCP provided by Metrics Technologies (more details below)

This week at the Design Automation Conference, we’re showcasing a first-of-its-kind implementation of EDA in the cloud: the Google Hardware Engineering team’s deployment of the Synopsys VCS simulation solution for internal EDA workloads on Google Cloud. We also have several new partnerships to help you achieve operational and engineering excellence through cloud computing, including:

  • Metrics Technologies is the first EDA platform provider of cloud-based SystemVerilog simulation and verification management, accelerating the move of semiconductor verification workloads into the cloud. The Metrics Cloud Simulator and Verification Manager, a pay-by-the-minute software-as-a-service (SaaS) solution built entirely on GCP, improves resource utilization and engineering productivity, and can scale capacity with variable demand. Simulation resources are dynamically adjusted up or down by the minute without the need to purchase additional hardware or licenses, or manage disk space. You can find Metrics news and reviews at www.metrics/news.ca, or schedule a demo at DAC 2018 at www.metrics.ca.
  • Elastifile delivers enterprise-grade, scalable file storage on Google Cloud. Powered by a high-performance, POSIX-compliant distributed file system with integrated object tiering, Elastifile simplifies storage and data management for EDA workflows. Deployable in minutes via Google Cloud Launcher, Elastifile enables cloud-accelerated circuit design and verification, with no changes required to existing tools and scripts.
  • NetApp is a leading provider of high-performance storage solutions. NetApp is launching Cloud Volumes for Google Cloud Platform, which is currently available in Private Preview. With NetApp Cloud Volumes, GCP customers have access to a fully-managed, familiar file storage (NFS) service with a cloud native experience.
  • Quobyte provides a parallel, distributed, POSIX-compatible file system that runs on GCP and on-premises to provide petabytes of storage and millions of IOPS. As a distributed file system, Quobyte scales IOPS and throughput linearly with the number of nodes–avoiding the performance bottlenecks of clustered or single filer solutions. You can try Quobyte today on the Cloud Launcher Marketplace.
If you’d like to learn more about EDA offerings on Google Cloud, we encourage you to visit us at booth 1251 at DAC 2018. And if you’re interested in learning more about how our Hardware Engineering team used Synopsys VCS on Google Cloud for internal Google workloads, please stop by Design Infrastructure Alley on Tuesday for a talk by team members Richard Ho and Ravi Rajamani. Hope to see you there!

Protect your Compute Engine resources with keys managed in Cloud Key Management Service



In Google Cloud Platform, customer data stored at rest is always encrypted by default using multiple layers of encryption technology. We also offer a continuum of encryption key management options to help meet the security requirements of your organization.

Did you know there is now beta functionality you can use to further increase protection of your Compute Engine disks, images and snapshots using your own encryption keys stored and managed with Cloud Key Management Service (KMS)? These customer-managed encryption keys (CMEKs) provide you with granular control over which disks, images and snapshots will be encrypted.

You can see below that on one end of the spectrum, Compute Engine automatically encrypts disks and manages keys on your behalf. On the other end, you can continue using your customer-supplied encryption keys (CSEK) for your most sensitive or regulated workloads, if you desire.

This feature helps you strike a balance between ease of use and control: Your keys are in the cloud, but still under your management. This option is ideal for organizations in regulated industries that need to meet strict requirements for data protection, but which don’t want to deal with the overhead of creating custom solutions to generate and store keys, manage and record key access, etc.

Setting up CMEK in Compute Engine helps quickly deliver peace of mind to these organizations, because they control access to the disk by virtue of controlling the key.

How to create a CMEK-supported disk

Getting started with the CMEK feature is easy. Follow the steps below to create and attach a Compute Engine disk that is encrypted with a key that you control.

You’ll need to create a key ring and key in KMS. This—and all the rest of the steps below—can be accomplished in several ways: through the Developer Console, the APIs or gcloud. In this tutorial, we’ll be using the Developer Console. We’ll start on the Cryptographic Keys page, where we’ll select “Create Key Ring.”

Give your keyring a name. Do the same with the key on the next page. For this tutorial, feel free to leave all the other fields as-is.

Having finished those steps, you now have a keyring with a single AES-256 encryption key. In the screenshot below, you can see it as “tutorial-keyring-1.” And since the keyring is managed by KMS, it is already fully integrated with Cloud Identity and Access Management (IAM) and Cloud Audit Logging, so you can easily manage permissions and monitor how it’s used.

With the key in place, you can start encrypting disks with CMEK keys. The instructions below are for creating a new instance and protecting its boot disk with a CMEK key. Note that it is also possible to create new encrypted disks from snapshots and images and attach them to existing VMs, or even to encrypt the underlying snapshots and images themselves.

First we’ll go to the VM instances page, and create a new VM instance.

On the instance creation page, expand the “Management, Disks, Networking and SSH keys” section and go to the “Disks” tab. There, you’ll see the three different encryption options described above. Select “Customer-managed key” and choose the appropriate key from the dropdown menu.

Note that if this is your first time doing this, you may see the following dialog. This is expected: you’ll need to grant this service account permissions. In turn, this service account is used by Compute Engine to do the actual encryption and decryption of disks, images and snapshots.

Once you’ve done this, confirm the VM creation by selecting “Create”.

And there you have it! With a few easy steps, you can create a key in Cloud KMS, encrypt a disk with the key and mount it to a VM. Since you manage the key, you can choose at any time to suspend or delete the key. If that happens, resources protected by that key won’t start until key access is restored.
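
If you'd rather script this than click through the console, here's a minimal sketch using the Compute Engine API from Python. The project, zone and key resource names are placeholders (the key name and location are hypothetical, only the keyring name comes from the tutorial above), and application-default credentials are assumed.

# Sketch: create a Compute Engine disk protected by a Cloud KMS key via the
# Compute Engine v1 API. Project, zone and key resource names are placeholders.
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

PROJECT = 'my-project'
ZONE = 'us-central1-a'
KMS_KEY = ('projects/my-project/locations/global/'
           'keyRings/tutorial-keyring-1/cryptoKeys/tutorial-key-1')

disk_body = {
    'name': 'cmek-protected-disk',
    'sizeGb': '100',
    # Reference the customer-managed key stored in Cloud KMS.
    'diskEncryptionKey': {'kmsKeyName': KMS_KEY},
}

operation = compute.disks().insert(
    project=PROJECT, zone=ZONE, body=disk_body).execute()
print(operation['name'])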

Try CMEKs for yourself

Visit the Developer Console and start using your CMEKs to help secure your Compute Engine disks, images and snapshots. As always, your feedback is invaluable to us, and we’d love to hear what you think. Safe computing!